Learning Efficient Representations for Keyword Spotting with Triplet Loss

Abstract

In the past few years, triplet loss-based metric embeddings have become a de facto standard for several important computer vision problems, most notably person reidentification. On the other hand, in the area of speech recognition, the metric embeddings generated by the triplet loss are rarely used, even for classification problems. We fill this gap by showing that a combination of two representation learning techniques, a triplet loss-based embedding and a variant of kNN for classification instead of cross-entropy loss, significantly (by 26% to 38%) improves the classification accuracy of convolutional networks on the LibriSpeech-derived LibriWords datasets. To do so, we propose a novel phonetic similarity-based triplet mining approach. We also improve the current best published SOTA for the Google Speech Commands dataset: V1 10+2-class classification by about 34%, achieving 98.55% accuracy; V2 10+2-class classification by about 20%, achieving 98.37% accuracy; and V2 35-class classification by over 50%, achieving 97.0% accuracy.
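The two ingredients named in the abstract, a triplet loss on embeddings and kNN classification over those embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean distance, the margin value, the plain majority-vote kNN (the paper uses a variant of kNN), and the toy two-dimensional "embeddings" are all assumptions made for illustration, and the phonetic similarity-based triplet mining is not shown.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (a hypothetical value here).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def knn_predict(query, embeddings, labels, k=3):
    """Plain majority-vote kNN over stored embeddings (illustrative only)."""
    order = sorted(range(len(embeddings)), key=lambda i: euclidean(query, embeddings[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D "embeddings" standing in for the output of a trained network.
train_embeddings = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]]
train_labels = ["yes", "yes", "no", "no"]

easy = triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 1.0])  # well-separated triplet
hard = triplet_loss([0.0, 0.0], [1.0, 1.0], [0.1, 0.0])  # violating triplet
prediction = knn_predict([0.05, 0.1], train_embeddings, train_labels, k=3)
print(easy, hard, prediction)
```

In this sketch the well-separated triplet contributes no loss while the violating one does, which is the signal that pulls same-word embeddings together during training. Classification then needs no softmax layer: a query is labeled by its nearest stored embeddings.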

Related resources

New efficient fillers for unlimited word recognition and keyword spotting

This paper describes our complete results for improved lexical fillers as well as two new kinds of fillers, gives their results in unlimited speech recognition as well as for keyword spotting and compares them to the acoustic-phonetic filler in the case of keyword spotting. Tests have been conducted on different vocabularies derived from ATIS and the Wall Street Journal database. Results for keyword s...

An Efficient Keyword Spotting Techni Language for Filler Mo

The task of keyword spotting is to detect a set of keywords in the input continuous speech. In a keyword spotter, not only the keywords, but also the non-keyword intervals must be modeled. For this purpose, filler (or garbage) models are used. To date, most keyword spotters have been based on hidden Markov models (HMMs). More specifically, a set of HMMs is used as garbage models. In this p...

Deep Residual Learning for Small-Footprint Keyword Spotting

We explore the application of deep residual learning and dilated convolutions to the keyword spotting task, using the recently-released Google Speech Commands Dataset as our benchmark. Our best residual network (ResNet) implementation significantly outperforms Google’s previous convolutional neural networks in terms of accuracy. By varying model depth and width, we can achieve compact models th...

Morphological Segmentation for Keyword Spotting

• We explore the impact of morphological segmentation on Keyword Spotting (KWS).
• Handling out-of-vocabulary (OOV) words is a major challenge in KWS; we aim to alleviate this problem by utilizing sub-word units.
• We augment a state-of-the-art KWS system with subword units derived from supervised and unsupervised morphological segmentations, and compare with phonetic and syllabic segmentatio...

Discriminative keyword spotting

This paper proposes a new approach for keyword spotting, which is not based on HMMs. The proposed method employs a new discriminative learning procedure, in which the learning phase aims at maximizing the area under the ROC curve, as this quantity is the most common measure to evaluate keyword spotters. The keyword spotter we devise is based on nonlinearly mapping the input acoustic representat...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2021

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-030-87802-3_69